Automatic Detection of Idiomatic Clauses

Authors

  • Anna Feldman
  • Jing Peng
Abstract

We describe several experiments whose goal is to automatically identify idiomatic expressions in written text. We explore two approaches to the task: 1) idiom recognition as outlier detection; and 2) supervised classification of sentences. For outlier detection, we apply principal component analysis. Because detecting idioms as lexical outliers does not exploit class label information, in the subsequent experiments we use linear discriminant analysis to obtain a discriminant subspace and then use a three-nearest-neighbor classifier to measure accuracy. We discuss the pros and cons of each approach. All the approaches are more general than previous algorithms for idiom detection: they neither rely on target idiom types, lexicons, or large manually annotated corpora, nor limit the search space to a particular type of linguistic construction.
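The two approaches named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each sentence has already been turned into a numeric feature vector (the abstract does not say how), scores outliers by PCA reconstruction error (one common choice), uses a two-class Fisher discriminant as the linear discriminant analysis step, and runs the three-nearest-neighbor vote in the projected space. The function names and the toy data are hypothetical.

```python
import numpy as np

def pca_outlier_scores(X, n_components):
    """Reconstruction error of each row after projecting onto the top principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    reconstructed = centered @ basis.T @ basis
    return np.linalg.norm(centered - reconstructed, axis=1)

def fisher_direction(X, y):
    """Two-class discriminant direction w = Sw^{-1} (m1 - m0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, with a small ridge term for numerical stability.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += 1e-6 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, m1 - m0)

def knn_predict(train_z, train_y, test_z, k=3):
    """Majority vote among the k nearest training points in the 1-D projected space."""
    train_y = np.asarray(train_y)
    preds = []
    for z in test_z:
        nearest = np.argsort(np.abs(train_z - z))[:k]
        preds.append(int(train_y[nearest].sum() * 2 > k))
    return np.array(preds)

# Approach 1 (unsupervised): hypothetical sentence vectors, the last one is unusual.
sentences = np.array([[1, 1, 0], [2, 2, 0], [3, 3, 0],
                      [4, 4, 0], [5, 5, 0], [3, 3, 4]], dtype=float)
scores = pca_outlier_scores(sentences, n_components=1)
print("most outlying sentence index:", int(np.argmax(scores)))

# Approach 2 (supervised): hypothetical labels, 0 = literal, 1 = idiomatic.
X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
                    [3, 3], [4, 3], [3, 4], [4, 4]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[0.5, 0.2], [3.5, 3.8]])

w = fisher_direction(X_train, y_train)
predictions = knn_predict(X_train @ w, y_train, X_test @ w, k=3)
print("predicted labels:", predictions.tolist())
```

The outlier scorer needs no labels, which matches the abstract's remark that outlier detection does not exploit class label information; the discriminant step is where the labels enter.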

Similar resources

Phraseological Clauses in Constructional HPSG

In this paper we investigate German idioms which contain phraseologically fixed clauses (PCl). To provide a comprehensive HPSG theory of PCls we extend the idiom theory of Soehn (2006) in such a way that it can distinguish different degrees of regularity in idiomatic expressions. An in-depth analysis of two characteristic PCls shows how our two-dimensional theory of idiomatic expressions can be...

Automatic Idiom Identification in Wiktionary

Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed ...

A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds

We present an annotation study on a representative dataset of literal and idiomatic uses of infinitive-verb compounds in German newspaper and journal texts. Infinitive-verb compounds form a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus res...

Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions

We describe an algorithm for automatic classification of idiomatic and literal expressions. Our starting point is that words in a given text segment, such as a paragraph, that are high-ranking representatives of a common topic of discussion are less likely to be a part of an idiomatic expression. Our additional hypothesis is that contexts in which idioms occur, typically, are more affective and ...

Like Finding a Needle in a Haystack: Annotating the American National Corpus for Idiomatic Expressions

This paper presents the details of a pilot study in which we tagged portions of the American National Corpus (ANC) for idioms composed of verb-noun constructions, prepositional phrases, and subordinate clauses. The three data sets we analyzed included 1,500-sentence samples from the spoken, the non-fiction, and the fiction portions of the ANC. This paper provides the details of the tagset we de...

Publication date: 2013